

#### Prof. Dr. Florian Künzner

Technical University of Applied Sciences Rosenheim, Computer Science

CA 10 – Associative memory

The lecture is based on the work and the documents of Prof. Dr. Theodor Tempelmeier

#### **CAMPUS** Rosenheim

**Computer Science** 



# Goal




### **CA::Associative memory**

- Memory hierarchy
- Associative memory
- Translation lookaside buffer
- Cache
- Memory protection




# Memory hierarchy

### Different kinds of memory exist

- On-chip: embedded into the IC
- Off-chip: standalone, as separate hardware
- The more embedded the memory,
  - the smaller its hardware size
  - the less storage capacity is available
  - the faster the memory
  - the more **expensive** it is
- The more standalone the memory,
  - the larger its hardware size
  - the more storage capacity is available
  - the slower the memory
  - the cheaper it is




# How long does a CPU instruction take?

Consider a modern CPU for a notebook or a workstation (e.g. Intel Core i7/i9). **How long** does a **single instruction take** until it is fully executed?






# How long does a memory access take?

Consider a modern memory module for a notebook or a workstation (e.g. DDR4). **How long** does a **load** or **store** from/to **memory** take?






# Memory hierarchy



Figure 1.2: Example of memory hierarchy in an ICT system for the past (a) and for now (b). The speed of the component from bottom to top increases, while the storage volume decreases. Those components are divided into on-chip (embedded with compute circuitry on the same chip) and off-chip (stand-alone as a separate chip).




# **Associative memory**

| Key (address)   | Value (information) |
|-----------------|---------------------|
| key<sub>0</sub> | value<sub>0</sub>   |
| key<sub>1</sub> | value<sub>1</sub>   |
| key<sub>2</sub> | value<sub>2</sub>   |
| key<sub>3</sub> | value<sub>3</sub>   |
| key<sub>4</sub> | value<sub>4</sub>   |
| key<sub>5</sub> | value<sub>5</sub>   |
| ...             | ...                 |

#### In principle

- A key-value store (comparable to a Java hashtable/dictionary)
- Key: address
- Value: some information

#### Properties

- The search for a key (address) is done in parallel in hardware!
- Access to the information is very fast

#### Usage

- TLB
- Cache
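The key-to-value principle can be sketched in software. The following is only a rough model under stated assumptions: the class name `AssociativeMemory`, its `capacity` parameter, and the "drop the oldest entry" policy are hypothetical and not taken from the lecture. In particular, the loop in `lookup` compares the keys one after another, whereas the hardware compares all keys at the same time.

```python
class AssociativeMemory:
    """Software model of a small associative memory (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # list of (key, value) pairs

    def store(self, key, value):
        # Overwrite an existing key, otherwise use a free entry.
        for i, (k, _) in enumerate(self.entries):
            if k == key:
                self.entries[i] = (key, value)
                return
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # assumed policy: drop the oldest entry
        self.entries.append((key, value))

    def lookup(self, key):
        # Hardware compares the key with all entries simultaneously;
        # this loop only emulates that comparison sequentially.
        for k, v in self.entries:
            if k == key:
                return v
        return None


memory = AssociativeMemory(capacity=4)
memory.store(0x0100, 0x1234)
print(memory.lookup(0x0100))  # 4660, i.e. 0x1234
print(memory.lookup(0x0200))  # None
```
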




**TLB** 

# Translation lookaside buffer




## **Address translation**

#### **Procedure**

1. Load page table(s)
2. Look up the entry inside the page table(s)
3. Address translation

#### **Problem**

- Address translation from virtual to real address is required
- Memory accesses may be required to obtain the real address

All that takes a lot of time, even with the MMU!
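A minimal sketch of the procedure above, assuming a 16-bit virtual address, a page size of 256 bytes, and a single-level page table. These sizes, the table contents, and the names `page_table` and `translate` are illustrative assumptions, not values from the lecture.

```python
PAGE_OFFSET_BITS = 8                        # assumed page size: 256 bytes
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

# Hypothetical single-level page table: virtual page number -> frame number.
page_table = {0x01: 0x2A, 0x02: 0x07}


def translate(virtual_address):
    """Translate a virtual address with one page-table lookup."""
    page_number = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & OFFSET_MASK
    if page_number not in page_table:
        raise LookupError(f"page fault for page {page_number:#04x}")
    # In hardware, reading the table entry costs an extra memory access.
    frame_number = page_table[page_number]
    return (frame_number << PAGE_OFFSET_BITS) | offset


print(hex(translate(0x0134)))  # 0x2a34
```

The offset is copied unchanged; only the upper part of the address is replaced. The extra memory access for the table entry is the cost that the slide refers to.
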




## Translation lookaside buffer

**Idea:** Use an associative memory for address translation from virtual to real addresses: the TLB (translation lookaside buffer).

| Key (virtual address)            | Value (real address)          |
|----------------------------------|-------------------------------|
| virtual_base_address<sub>0</sub> | real_base_address<sub>0</sub> |
| virtual_base_address<sub>1</sub> | real_base_address<sub>1</sub> |
| virtual_base_address<sub>2</sub> | real_base_address<sub>2</sub> |
| virtual_base_address<sub>3</sub> | real_base_address<sub>3</sub> |
| virtual_base_address<sub>4</sub> | real_base_address<sub>4</sub> |
| virtual_base_address<sub>5</sub> | real_base_address<sub>5</sub> |
| ...                              | ...                           |

- virtual\_base\_address: Virtual address without offset
- real\_base\_address: Real (frame) address without offset




# Translation lookaside buffer

Address translation: virtual address to real address

#### Step 1 (fast way):

- Try to obtain the real address through the TLB
- If the TLB
  - contains the entry: done!
  - doesn't contain the entry: go to step 2!

#### Step 2 (slow way):

- Load page table(s)
- Look up the entry inside the page table(s)
- Address translation
- Store the translation in the TLB

Address translation with TLB always tries step 1 first!
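The two steps can be sketched as follows. This is a simplified model under assumptions: the page size (256 bytes), the page table contents, and the names `tlb`, `stats`, and `translate_with_tlb` are hypothetical. A plain dictionary stands in for the TLB, so the limited number of entries of a real TLB is not modelled.

```python
PAGE_OFFSET_BITS = 8                        # assumed page size: 256 bytes
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

page_table = {0x01: 0x2A, 0x02: 0x07}       # hypothetical page table in memory
tlb = {}                                    # virtual base address -> real base address
stats = {"hits": 0, "misses": 0}


def translate_with_tlb(virtual_address):
    virtual_base = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & OFFSET_MASK
    if virtual_base in tlb:                 # step 1 (fast way)
        stats["hits"] += 1
        real_base = tlb[virtual_base]
    else:                                   # step 2 (slow way)
        stats["misses"] += 1
        real_base = page_table[virtual_base]
        tlb[virtual_base] = real_base       # store the translation in the TLB
    return (real_base << PAGE_OFFSET_BITS) | offset


first = translate_with_tlb(0x0134)          # miss: page 0x01 is not in the TLB yet
second = translate_with_tlb(0x0178)         # hit: same page as before
print(hex(first), hex(second), stats)
```

The second access touches the same page as the first one, so it is served from the TLB without consulting the page table.
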




## Cache

# Caches inside the CPU




# Loading of data and instructions

Before the CPU can process data, it must first be loaded from memory into the registers.

#### Problem:

- CPU instructions are very fast (< 1 ns)
- Memory access is slow by comparison (< 30 ns)

We should try to bring the data closer to the CPU!




## Cache

**Idea:** Use an associative memory to store data (parts of the main memory) closer to the CPU: the cache!



| Key (real address)       | Value (data)     |
|--------------------------|------------------|
| real_address<sub>0</sub> | data<sub>0</sub> |
| real_address<sub>1</sub> | data<sub>1</sub> |
| real_address<sub>2</sub> | data<sub>2</sub> |
| real_address<sub>3</sub> | data<sub>3</sub> |
| real_address<sub>4</sub> | data<sub>4</sub> |
| real_address<sub>5</sub> | data<sub>5</sub> |
| ...                      | ...              |




# Cache details (example)

### Given details:

- 16-bit system
- Cache line size: 4 bytes
- Real address: 0x0100
- Data (for given address): 0x1234

| Key (real address) | Byte #0 | Byte #1 | Byte #2 | Byte #3 |
|--------------------|---------|---------|---------|---------|
| 0x0100             | 0x12    | 0x34    | ?       | ?       |

This is the view for a BE (big-endian) architecture.
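The example can be reproduced with a few lines of Python. This sketch assumes that the key of a cache line is the address of its first byte (a line-aligned address), which matches the key 0x0100 in the table; the helper name `split_address` is hypothetical. The little-endian byte order is shown only for comparison.

```python
LINE_SIZE = 4  # bytes per cache line, as in the example


def split_address(address):
    """Return (line base address, byte offset within the line)."""
    offset = address % LINE_SIZE
    return address - offset, offset


base, offset = split_address(0x0100)
big_endian = (0x1234).to_bytes(2, byteorder="big")
little_endian = (0x1234).to_bytes(2, byteorder="little")

print(hex(base), offset)                 # 0x100 0
print([hex(b) for b in big_endian])      # ['0x12', '0x34']
print([hex(b) for b in little_endian])   # ['0x34', '0x12']
```
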




## Cache

### Data access: read/write from/to memory through the cache

### Step 1 (fast way):

- Try to obtain the data through the cache
- If the cache
  - (cache hit) contains the entry: done!
  - (cache miss) doesn't contain the entry: go to step 2!

### Step 2 (slow way):

- Load the data from memory (read) or store it into memory (write)
- Store the data in the cache

- If new data is stored in the cache, old data may have to be replaced.
- A cache hit rate of at least 90% should be achieved.




# Cache writing strategies

The modified data in the cache has to be written back to the memory at some point.

### Write through

On a write into a word, the data is immediately transferred into both the cache and the memory.

### Write back

- On a write into a word, the data is only changed in the cache.
- On the corresponding cache line (entry), the modified bit is set.
- Temporarily, the memory contains invalid data (the old version(s)).
- If the cache line (entry) is **invalidated**, the **data is written back** to the memory.
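The difference between writing to memory immediately (write through) and deferring the write until the line is invalidated (commonly called write back) can be sketched with a toy model. The class `Cache`, its `write_back` flag, and the one-word-per-entry layout are hypothetical simplifications.

```python
class Cache:
    """Toy cache with one word per entry; `write_back` selects the strategy."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict: address -> word (models main memory)
        self.write_back = write_back
        self.lines = {}               # address -> [word, modified bit]

    def write(self, address, word):
        if self.write_back:
            self.lines[address] = [word, True]    # only the cache changes
        else:
            self.lines[address] = [word, False]
            self.memory[address] = word           # write through: memory too

    def invalidate(self, address):
        word, modified = self.lines.pop(address)
        if modified:                              # write the modified data back
            self.memory[address] = word


memory_wt = {0x0100: 0}
Cache(memory_wt, write_back=False).write(0x0100, 0x1234)

memory_wb = {0x0100: 0}
cache_wb = Cache(memory_wb, write_back=True)
cache_wb.write(0x0100, 0x1234)
stale = memory_wb[0x0100]          # memory still holds the old value
cache_wb.invalidate(0x0100)

print(hex(memory_wt[0x0100]), stale, hex(memory_wb[0x0100]))  # 0x1234 0 0x1234
```
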




# Intel Core i7 caching

### How does the Intel Core i7 caching hierarchy work?

- Multiple caches with different sizes
- Von Neumann architecture with Harvard architecture ideas!




# Cache example Intel Core i7

### Intel Core i7 7700K:

- **Split** cache: separate caches for data (D) and instructions (I)
- Cache hierarchy with different sizes: L1, L2, and L3
- Cache line width: 64 bytes

[More information on cache hierarchy behaviour]

### Cache latency:

- L1(D): 4 cycles
- L1(I): 5 cycles
- L2: 12 cycles
- L3: 38 cycles

[source: www.7-cpu.com]

[simplified schematic view for 1 core]
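To see what these latencies mean for an average data access, the following sketch computes an expected access time in cycles. Only the three cache latencies come from the list above. The hit rates per level and the main-memory latency of 200 cycles are assumptions chosen for illustration, and each latency is treated as the total time of an access that hits in that level.

```python
# Latencies in cycles from the list above (data path: L1(D), L2, L3).
LATENCY = {"L1": 4, "L2": 12, "L3": 38}

# Assumptions, not from the slides: hit rate per level and the cost of a
# main-memory access in cycles.
HIT_RATE = {"L1": 0.90, "L2": 0.80, "L3": 0.50}
MEMORY_LATENCY = 200


def average_access_cycles():
    """Expected cycles per access under the assumptions above."""
    expected = 0.0
    reach = 1.0                      # probability that the access gets this far
    for level in ("L1", "L2", "L3"):
        expected += reach * HIT_RATE[level] * LATENCY[level]
        reach *= 1.0 - HIT_RATE[level]
    return expected + reach * MEMORY_LATENCY


print(round(average_access_cycles(), 2))  # 6.94
```

Under these assumed rates, only 1% of the accesses reach main memory, yet they contribute 2 of the roughly 7 cycles. This is why a high hit rate matters.
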


Prof. Dr. Florian Künzner, summer semester 2022 (SoSe 2022)




## Memory protection

# How to protect the memory?




## Memory protection unit

A memory protection unit (MPU) is a smaller version of an MMU that only contains the memory protection support.

- Privileged software can define the memory regions and their attributes.
- If an access violation is detected by the MPU, a fault exception is triggered.

### **Properties**

- Memory region: a fixed base address and a fixed size
- Memory attributes: shared, cached, ...
- Access rights: read, write, execute

### **Benefits**

- Increased security during code execution
- Different privilege levels in an application
- Strict separation of code, data and stack (also between different tasks)
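The region check of an MPU can be modelled in a few lines. This is a sketch with hypothetical names and values: the `Region` class, the `MemoryFault` exception, and the layout of the three regions (code, data, stack) are assumptions. A real MPU performs this check in hardware on every access and is configured through registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int          # fixed base address
    size: int          # fixed size in bytes
    rights: frozenset  # subset of {"read", "write", "execute"}


class MemoryFault(Exception):
    """Models the fault exception triggered on an access violation."""


# Hypothetical region layout: code, data, stack.
REGIONS = [
    Region(0x0000, 0x4000, frozenset({"read", "execute"})),
    Region(0x4000, 0x2000, frozenset({"read", "write"})),
    Region(0x6000, 0x1000, frozenset({"read", "write"})),
]


def check_access(address, access):
    """Raise MemoryFault unless a region permits this kind of access."""
    for region in REGIONS:
        if region.base <= address < region.base + region.size:
            if access in region.rights:
                return
            raise MemoryFault(f"{access} not permitted at {address:#06x}")
    raise MemoryFault(f"no region covers {address:#06x}")


check_access(0x0100, "execute")        # allowed: code region
try:
    check_access(0x0100, "write")      # writing to code is a violation
except MemoryFault as fault:
    print("fault:", fault)             # fault: write not permitted at 0x0100
```
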

### CAMPUS Rosenheim

Computer Science



## Memory protection unit

A memory protection unit (MPU) is a smaller version of a MMU that only contains the memory protection support.

- Privileged software can define the memory regions and its attributes.
- If an access violation is detected by the MPU a fault exception is triggered.

### **Properties**

- Memory region: A fixed base address and a fixed size
- Memory attributes: shared, cached, ...
- Access rights: read, write, execute

- Increased security during code execution
- Different privilege levels in an application
- Strict separation of code, data and stack (also between different tasks)

Computer Science



## Memory protection unit

A memory protection unit (MPU) is a smaller version of a MMU that only contains the memory protection support.

- Privileged software can define the memory regions and its attributes.
- If an access violation is detected by the MPU a fault exception is triggered.

### **Properties**

- Memory region: A fixed base address and a fixed size
- Memory attributes: shared, cached, ...
- Access rights: read, write, execute

- Increased security during code execution
- Different privilege levels in an application
- Strict separation of code, data and stack (also between different tasks)

**Computer Science** 



## Memory protection unit

A memory protection unit (MPU) is a smaller version of a MMU that only contains the memory protection support.

- Privileged software can define the memory regions and its attributes.
- If an access violation is detected by the MPU a fault exception is triggered.

### **Properties**

- Memory region: A fixed base address and a fixed size
- Memory attributes: shared, cached, ...
- Access rights: read, write, execute

- Increased security during code execution
- Different privilege levels in an application
- Strict separation of code, data and stack (also between different tasks)

# Memory protection unit

A memory protection unit (MPU) is a reduced version of an MMU that provides only the memory protection support.

- Privileged software can define the memory regions and their attributes.
- If the MPU detects an access violation, a fault exception is triggered.

#### **Properties**

- Memory region: A fixed base address and a fixed size
- Memory attributes: shared, cached, ...
- Access rights: read, write, execute

#### An MPU can be used for

- Increased security during code execution
- Different privilege levels in an application
- Strict separation of code, data and stack (also between different tasks)

[see: ARM Cortex M3 - MPU, MPU peripheral]
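The behaviour described above (regions with a base address, a size and access rights, configured by privileged software and checked on every access) can be sketched in a few lines of Python. This is only an illustrative model, not the register interface of any real MPU: the names `Region`, `Mpu`, `MpuFault` and the `privileged_only` attribute are assumptions made for this sketch, and the rule "first matching region decides" is a simplification.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Access(Flag):
    """Access rights that can be granted for a region."""
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


class MpuFault(Exception):
    """Stands in for the fault exception triggered on an access violation."""


@dataclass(frozen=True)
class Region:
    base: int                      # fixed base address
    size: int                      # fixed size in bytes
    rights: Access                 # allowed kinds of access
    privileged_only: bool = False  # hypothetical attribute for this sketch

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


class Mpu:
    def __init__(self):
        self._regions = []

    def configure(self, region, privileged):
        # Only privileged software may define regions and their attributes.
        if not privileged:
            raise MpuFault("unprivileged code tried to configure the MPU")
        self._regions.append(region)

    def check(self, address, wanted, privileged=False):
        # Simplification: the first region containing the address decides.
        for region in self._regions:
            if not region.contains(address):
                continue
            if region.privileged_only and not privileged:
                break
            if wanted in region.rights:
                return  # access allowed
            break
        raise MpuFault(f"{wanted.name} access to {address:#010x} denied")


# Example: separate code and data regions (addresses chosen arbitrarily).
mpu = Mpu()
mpu.configure(Region(0x0000_0000, 0x1000, Access.READ | Access.EXECUTE), privileged=True)
mpu.configure(Region(0x2000_0000, 0x0800, Access.READ | Access.WRITE), privileged=True)

mpu.check(0x0000_0100, Access.EXECUTE)      # allowed: executing code
mpu.check(0x2000_0010, Access.WRITE)        # allowed: writing data
try:
    mpu.check(0x0000_0100, Access.WRITE)    # violation: writing into the code region
except MpuFault as fault:
    print("fault:", fault)
```

In this model an access to an address outside every configured region also raises the fault, which mirrors the strict separation of code, data and stack listed above.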

# Memory protection

### Memory protection with virtual memory and the MMU

For each page, the following information is stored:

- R/W = read/write
- RO = read only
- EO = execute only
- U/S = user/supervisor

This per-page information is the basis for memory protection.

A **process** can only access memory through the virtual memory mechanism (MMU) and **can** therefore **only access memory assigned** by the OS.
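A minimal sketch of how such per-page information could be checked during address translation is given below. It is a simplified model, not the page table format of a real processor: the 4 KiB page size, the names `PageEntry`, `translate` and `PageFault`, and the encoding of U/S as a boolean are assumptions made for illustration. Each process has its own page table, so an address without an entry is simply not reachable for that process.

```python
from dataclasses import dataclass
from enum import Enum

PAGE_SIZE = 4096  # assumed page size for this sketch


class Protection(Enum):
    RW = "read/write"
    RO = "read only"
    EO = "execute only"


class PageFault(Exception):
    """Stands in for the exception raised by the MMU on an illegal access."""


@dataclass(frozen=True)
class PageEntry:
    frame: int             # physical frame number
    protection: Protection
    supervisor_only: bool  # U/S: True means only the supervisor may access


# Kinds of access permitted by each protection setting (simplified).
ALLOWED = {
    Protection.RW: {"read", "write"},
    Protection.RO: {"read"},
    Protection.EO: {"execute"},
}


def translate(page_table, virtual_address, access, supervisor=False):
    """Return the physical address or raise PageFault."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:
        raise PageFault(f"page {page} is not assigned to this process")
    if entry.supervisor_only and not supervisor:
        raise PageFault(f"page {page} is reserved for the supervisor")
    if access not in ALLOWED[entry.protection]:
        raise PageFault(f"{access} not permitted on page {page} ({entry.protection.value})")
    return entry.frame * PAGE_SIZE + offset


# Page table of one process, as it might be set up by the OS.
process_a = {
    0: PageEntry(frame=7, protection=Protection.EO, supervisor_only=False),  # code
    1: PageEntry(frame=3, protection=Protection.RW, supervisor_only=False),  # data
    2: PageEntry(frame=9, protection=Protection.RW, supervisor_only=True),   # kernel data
}

print(hex(translate(process_a, 0x1004, "write")))  # data page: allowed
try:
    translate(process_a, 0x0010, "write")          # code page: write is rejected
except PageFault as fault:
    print("fault:", fault)
```

Because every access goes through `translate`, the process can only reach the frames that appear in its own table, which is the isolation property stated above.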

Prof. Dr. Florian Künzner, SoSe 2022

# Memory protection

[Figure not recoverable; surviving label text: "access, turn computer on/off (MINIX 3)"]

[source: Is hardware the black hole of computing?]

# Summary and outlook

## Summary

- Memory hierarchy
- Associative memory
- Translation lookaside buffer
- Cache
- Memory protection

## Outlook

- Bus and I/O
